Extending Entity-Relationship Modeling Notation To Manage Fuzzy Datasets

Authors

  • Gregory Vert
  • Ashley Morris
  • Molly Stock
  • Piotr Jankowski
Abstract

Current work in modeling has focused on fuzziness as it applies to single entities. Fuzzy theory may also be applied to the management of data sets and collections, treating the data sets as collections of fuzzy objects. The application of fuzzy theory to the management of sets has not been fully explored. In the area of Geospatial Information Science (GIS), one can find many types of sets representing ambiguous, overlapping spatial extents. As part of our research, an existing ERD data model notation is being populated and extended with a notation representing fuzzy theory as it applies to the problem of set management. One part of this research is the development of a discretizing function D() for fuzzy problems defined by continuous field data. D() is a specialized member of the class of functions M(), where the values it selects on are spatial definitions and temporally continuous fields. We have also developed a function M_spatial(), a specialized member of M() pertaining to noncontinuous spatial entities.

Background

The University of Idaho College of Forestry manages a research project referred to as the Experimental Forest. The forest encompasses 7,300 acres in multiple non-contiguous tracts [1]. Because of the heavy use of this laboratory, the university began to seek ways to track and annotate developments in the forest. GIS data has become more prevalent in the management of the Experimental Forest. Students use the data to learn principles of GIS and to maintain forest data. Because GIS data about the forest is used in so many ways, it has become a critical information resource for the forest's management. Students are continually updating information about the Experimental Forest, and adding good data to bad data, or merging bad data with good data, can produce an inaccurate resulting data set. Therefore, the quality of data sets must be documented and managed.
If documentation is a basis for the data integrity of the Experimental Forest's data sets, then the creation of relationships between documented data sets is the key to maintaining integrity. This means that relationships must be created between data sets that document the ancestry, spatial relationships and temporality of the data. A solution to the need for organization can be found in the development of a data architecture. Such an architecture must not only organize and coordinate data, but also provide lineage. At the center of this architecture is the need to manage multiple heterogeneous data formats, and data about sets of data. This research has taken a data-centric approach in that it has developed a data model for managing metadata as the key underpinning of the set management architecture. With an architecture centered on data, it then becomes possible to construct a real-world solution.

An ERD Model To Manage GIS Data

The heart of our architecture is a data model, an entity relationship diagram (ERD) that details the relationships between data. To extend this architecture to a variety of data formats, the data model does not attempt to manage the data itself; rather, it manages data about the data under management. Because metadata management is an abstraction that does not depend on any one data format, it is a particularly robust approach.

Entities, Relationships and Extensions

The SuperSet entity defines collections of Subsets that are collectively referred to as a project. It has a recursive relationship to other SuperSet entity instances. This relationship captures the fact that multiple supersets may be related to each other. Examples of where this may occur are supersets that cover the same geographic region, or alternative views or lineages of the same project. A SuperSet is composed of multiple Subsets. The relationship of SuperSets to Subsets introduces the new extension of fuzzy Subsets that will be presented later.
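The recursive SuperSet relationship and the SuperSet-to-Subset composition can be sketched as plain data structures. The Python below is illustrative only: the class and field names (`SuperSet`, `Subset`, `parent`, `related`, `lineage()`) are hypothetical stand-ins for the ERD entities, treating the SuperSet-to-SuperSet relationship as symmetric is an assumption, and the Subset `parent` link anticipates the Subset versioning relationship described next.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Subset:
    """Hypothetical metadescription of one component of a SuperSet (not the data itself)."""
    name: str
    # Recursive link to the version this Subset was derived from (None for a root version).
    parent: Optional[Subset] = None

    def lineage(self) -> list[str]:
        """Follow parent links back to the root, newest version first."""
        chain: list[str] = []
        node: Optional[Subset] = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


@dataclass(eq=False)
class SuperSet:
    """Hypothetical project record: a collection of Subsets plus links to related SuperSets."""
    name: str
    subsets: list[Subset] = field(default_factory=list)
    # Recursive SuperSet-to-SuperSet relationship; repr=False avoids printing the cycle.
    related: list[SuperSet] = field(default_factory=list, repr=False)

    def relate(self, other: SuperSet) -> None:
        """Link two SuperSets, assuming the relationship is symmetric (e.g. same region)."""
        if other is self:
            return
        if other not in self.related:
            self.related.append(other)
        if self not in other.related:
            other.related.append(self)
```

For example, `Subset("stands_2000", parent=Subset("stands_1998")).lineage()` returns `['stands_2000', 'stands_1998']`. The dataclasses use `eq=False` so that membership tests compare object identity; field-by-field equality would otherwise recurse through the cyclic `related` lists.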
An instance of the Subset entity is a single logical metadescription of a component in a SuperSet. It also has a recursive relationship to itself that allows one to implement versions of the subset and thus a lineage tree. Qian and Peuquet have developed a model, TRIAD [3], that takes into account the temporal nature of GIS data. We extend this work with the Temporal Location entity. The Temporal Location entity represents a time-location definition for a data set. It has temporal attribute values and geospatial coordinates that create identifiers for a particular data set. Because locating an object temporally is inherently ambiguous, this entity is a candidate for the fuzzy extensions presented in later sections.

The Geographic Region entity locates a Subset of data by its spatial coverage. This entity works in conjunction with the Temporal Location entity to locate a set of data in time and space; the problem therefore involves locating both where the coverage is geographically and when it is temporally. The Set Type entity describes the type of data a Subset may be. The types of attributes found in this entity have not yet been fully defined, but are expected to be "image", "raster", etc. This entity is also a candidate for the fuzzy notation extension that follows this section.

The Set View entity provides a repository for the various views of data that may exist for a given Subset that has been located spatially and temporally. It contains information such as the perspective on the display of the data, the scale at which the data is represented and the projection in which a particular piece of data is represented. It is meant to support the requirement that multiple views of the same dataset are possible. The Location entity describes the physical location of the Subset at a particular point in time.
The existence of this entity extends the capabilities of this architecture by providing the potential to support distributed database mechanisms, and by providing a mechanism for alternative views of data that are not centrally located. The Raw Source entity is the only entity in this model whose instantiation represents a real object, in this case the physical data described by a Subset. This entity contains only enough information to locate a file and identify it as belonging to a particular instance of Subset.

Model for Managing Data Sets

We wanted to provide several capabilities for set management that needed to be reflected in the type of data model that was constructed. Specifically, we wanted to provide:
  • versioning and lineages on sets of data – to view the ancestry of a data set and see what changes occurred via an annotated record
  • multiple views of the same data set – to allow for differences in projection and scale
  • temporal localization – a method to determine, for a point in time, which datasets, especially those with overlapping coverages, would be most germane to a particular need
  • the ability to address and resolve ambiguities found in GIS datasets, such as overlapping coverages – to resolve conflicts where several datasets could be retrieved for a particular geospatial region.

The model follows a standardized ERD modeling notation used by ORACLE [2] and is shown in Figure 1.

Figure 1. An ERD Model for managing GIS Datasets

A Fuzzy ERD Model, Notational Conventions and Extensions

The data model presented thus far seeks to manage metadata about relationships between datasets. The selection of geospatial data is typically subject to ambiguities that can be addressed by the application of fuzzy set theory. Overlapping spatial coverages are not the only type of ambiguity that can exist; we have also noted that there can be overlapping temporal locations for spatial coverages.
Beyond the question of partial versus complete overlap, datasets whose coverages overlap completely may still differ in characteristics of the spatial coverage such as projection, scale and data type. Considering this situation, it is clear that ambiguity in one attribute of spatial data can compound with other ambiguities about the same data. These types of ambiguity can be a problem even in a single-user environment. However, the Experimental Forest's data is accessed by multiple users, who may have different needs for overlapping coverages. Taken together with the ambiguities above, the number of users makes for a potentially large and complex problem. This situation lends itself well to the application of fuzzy principles for the selection of data and the definition of relations between sets of data. Much has been written about fuzzy set techniques for decision making, such as Ordered Weighted Averaging (OWA) operators [4]. We extend our data model for metadata set management with notations representing fuzzy principles in order to create a more powerful model that can manage ambiguous data.

In order to apply fuzzy theory to the problem of dataset management, one first needs a data model that describes the relationships between data, as defined previously. Using this model, we have identified areas where ambiguities in data can exist and have developed and extended notation to apply fuzzy theory to these areas. Chen [5] defines an Extended Entity-Relationship (EER) model as the triple $M = (E, R, A)$, where $M$ represents the model, $E$ the entities, $R$ the relationships and $A$ the attributes. In [5], $E$, $R$ and $A$ are defined to have fuzzy membership functions. In particular:

$$R = \{\, \mu_R(R)/R \mid R \text{ is a relationship involving entities in } E \text{ and } \mu_R(R) \in [0,1] \,\}$$

Here, $\mu_R()$ is a fuzzy membership function on the relationships between entities in a model.
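To make the notation concrete, the fuzzy relationship set $R$ can be represented as a mapping from each relationship to its membership grade $\mu_R$. The sketch below is a minimal illustration under stated assumptions: the `Relationship` and `FuzzyRelationshipSet` names are hypothetical, and the α-cut helper is a standard fuzzy-set operation added for illustration, not part of Chen's definition or of this paper's model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """A hypothetical named relationship between two entity instances."""
    name: str
    source: str
    target: str


class FuzzyRelationshipSet:
    """Fuzzy set R = {mu_R(r)/r}: each relationship r carries a grade in [0, 1]."""

    def __init__(self) -> None:
        self._grades: dict[Relationship, float] = {}

    def add(self, rel: Relationship, grade: float) -> None:
        """Record mu_R(rel); grades outside [0, 1] are rejected."""
        if not 0.0 <= grade <= 1.0:
            raise ValueError(f"membership grade {grade} is outside [0, 1]")
        self._grades[rel] = grade

    def membership(self, rel: Relationship) -> float:
        """mu_R(rel); a relationship that was never added has grade 0."""
        return self._grades.get(rel, 0.0)

    def alpha_cut(self, alpha: float) -> set[Relationship]:
        """Crisp set of relationships whose grade is at least alpha."""
        return {rel for rel, grade in self._grades.items() if grade >= alpha}
```

For example, recording a hypothetical `covers` relationship from a SuperSet to one Subset with grade 0.9 and to another with grade 0.4, then calling `alpha_cut(0.5)`, keeps only the first relationship.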
Fuzzy membership functions are also defined on attributes and entities. It is therefore possible to have fuzzy relationships without dependencies on other types of fuzzy objects in a model. Hence, we extend our data model by defining notations that describe the application of fuzzy theory to relationships.

Subset symbol

The first new notational convention is the Subset symbol. The Subset symbol defines a new type of relationship, that of the "bag". An entity with the Subset symbol defined on one of its relations is a non-unique entity, unlike most entities in an ERD model. The rationale for its existence is that multiple copies of a Subset containing the same elements can exist for different temporal locations covering the same geographic region. The symbol is defined as:

[Figure: Subset symbol]

By its nature as a non-unique entity, a Subset relation is also a fuzzy relation. Because Subsets are discrete, the Subset symbol occurs together with the symbol for the fuzzy relation M().
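The paper does not give closed forms for M() or D(). As a hedged illustration only, the sketch below treats a Subset bag as a list of copies with identical elements but different temporal locations, grades each copy with an assumed triangular membership function standing in for M(), and samples that continuous function at evenly spaced points as a stand-in for the discretizing function D(). All names (`SubsetCopy`, `triangular_m`, `rank_bag`, `discretize`), the use of a single year in place of a full Temporal Location, and the triangular shape are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SubsetCopy:
    """One member of a Subset 'bag': same elements, different temporal location."""
    elements: frozenset[str]
    year: float  # hypothetical stand-in for a Temporal Location instance


def triangular_m(center: float, width: float) -> Callable[[float], float]:
    """Assumed M(): grade 1 at `center`, falling linearly to 0 at center +/- width."""
    if width <= 0:
        raise ValueError("width must be positive")

    def m(x: float) -> float:
        return max(0.0, 1.0 - abs(x - center) / width)

    return m


def rank_bag(bag: list[SubsetCopy], m: Callable[[float], float]) -> list[tuple[SubsetCopy, float]]:
    """Grade every copy in the bag with M() and sort by descending membership."""
    graded = [(item, m(item.year)) for item in bag]
    # sorted() is stable, so copies with equal grades keep their bag order.
    return sorted(graded, key=lambda pair: pair[1], reverse=True)


def discretize(m: Callable[[float], float], start: float, stop: float, n: int) -> list[tuple[float, float]]:
    """Assumed D(): sample a continuous membership function at n evenly spaced points."""
    if n < 2:
        raise ValueError("need at least two sample points")
    step = (stop - start) / (n - 1)
    return [(start + i * step, m(start + i * step)) for i in range(n)]
```

With `triangular_m(2000, 5)`, copies dated 1999 and 2001 both receive grade 0.8 and a copy dated 1995 receives 0, while `discretize(triangular_m(2000, 5), 1995, 2005, 5)` yields grades 0, 0.5, 1, 0.5, 0 at 1995, 1997.5, 2000, 2002.5 and 2005.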

Publication date: 2001